Getting Started¶
Contents¶
Setting up the python environment¶
python has been chosen as the main language of this book, for several reasons. It doesn’t have too much jargon, and the signal-to-noise ratio in code is very high (it’s almost pseudocode). While python is not the optimal language for scientific computing, it is familiar to most people with basic knowledge of programming, and the learning curve for python is not as steep as C++, Java, or Julia. One of the drawbacks associated with using python is that it doesn’t handle large graphs or datasets very well. C++ and Java both deal with larger structures more efficienly, and many authors of seminal papers in the field of GIS provide C++ and Java implementations alongside their papers. Julia, on the other hand, is a perfect midpoint between python and C++/Java. It maintains the readability of python but was developed specifically for scientific computing applications.
For the sake of accessibility, the example and exercises in this book will be based entirely in python. Implementations in other languages are welcome as pull requests. Simply make a PR of this repo and we’ll include any submissions that look promising.
From this point forward, we’ll assume that you already have python 3.6+ installed on your system. For installation instructions specific to your operating system, see this Beginner’s Guide.
There are two primary ways to install the required packages you’ll need: pip and conda. We recommend using pip as conda may have some adverse effects on system dependencies when used improperly in Linux.
Note
If you intend on developing in Windows, we recommend that you leverage the convenience of the Windows Subsystem for Linux (WSL). This allows you to use the full capabilities of Linux from within Windows. You can read more about setting up WSL here. After enabling WSL, you can proceed with the rest of this book as if you were operating in Linux.
Using pip3¶
Execute the following commands in your terminal:
$ sudo apt update
$ sudo apt install python3-pip
Install venv and create a python virtual environment:
$ sudo apt install python3.8-venv
$ mkdir <new directory for venv>
$ python3 -m venv <path to venv directory>
Make sure that you replace python3.8 with the version of python you are using.
$ python3 -m ensurepip --upgrade
venv in included with python3.8+. You can run the following command to create a virtual environment.
$ mkdir <new directory for venv>
$ python3 -m venv <path to venv directory>
You can now access your venv using the following command:
$ source <path to venv>/bin/activate
Using conda¶
Install conda for your OS using the guide found here.
You can create a conda environment like this:
$ conda create --name <name of env> python=<your version of python>
Access the newly-created environment using this command:
$ conda activate <your env name>
Installing Jupyter Notebook¶
All of the code in this book is stored in Jupyter Notebooks (.ipynb). To access these files directly, you have two options:
Open Binder or Google Colab using the icon on the top right of a page containing
pythoncode.Install Jupyter Notebook locally.
Jupyter Notebook can be installed as follows:
$ pip3 install jupyterlab
$ pip3 install notebook
$ conda install -c conda-forge jupyterlab
$ conda install -c conda-forge notebook
Getting the data¶
Most of the data used in this book will be sourced from OpenStreetMaps. For any other datasets that are not OSM-related, you can download the data in whatever format it is available in, and import it into python using the appropriate methods. Open Data websites hosted by various government bodies are a great source of data related to infrastructure, population metrics, and land use. See the Datasets page for some examples.
For OpenStreetMaps, there are two primary ways of retrieving the data:
Download the data as-is from OpenStreetMaps’ website and use tools like
osmfilterto tune it as needed. This is not recommended as it is more difficult and not very efficient.Use OpenStreetMaps’ API (Overpass API) to query for data. This filters the data on the fly and you only retrieve what you need. The API is accessible in both
Javaandpython.
Data Completeness
The data from OSM is not always “complete”. This doesn’t mean that there are major uncharted regions, but rather that neighbouring nodes are not always grouped correctly. For some nodes where there are feasible connections between them in real life, OSM still represents them as having no way or relation connecting them. This means that using the osmnx parser will result in the nodes being placed in separate graph components, which is not accurate to real-world conditions. We can use osrm to find routes between these kind of nodes and thus “complete” the graphs.
You can read more about the completeness of OpenStreetMaps data here:
Basic Operations¶
This section contains a non-exhaustive list of operations on geospatial data that you should familiarize yourself with. More information can be found by consulting the Tools and Python Libraries page or the respective libary’s API documentation.
Creating a Graph from a named place¶
osmnx can convert a text descriptor of a place into a networkx graph. Let’s use the University of Toronto as an example:
import osmnx
place_name = "University of Toronto"
# networkx graph of the named place
graph = osmnx.graph_from_address(place_name)
# Plot the graphs
osmnx.plot_graph(graph,figsize=(10,10))
(<Figure size 720x720 with 1 Axes>, <AxesSubplot:>)
The graph shows edges and nodes of the road network surrouding the University of Toronto’s St. George (downtown Toronto) campus. While it may look visually interesting, it extends a bit too far off campus, and we lack the context of the street names and other geographic features. Let’s restrict the scope of the network to 500 meters around the university, and use a folium map as a baselayer.
graph = osmnx.graph_from_address(place_name, dist=500)
osmnx.folium.plot_graph_folium(graph)
Nice! Suppose we want to get from one location to another on this campus.
Starting at the King Edward VII Equestrian Statue near Queen’s Park, we need to cut across the campus to attend a lecture at the Bahen Centre for Information Technology. Later in this section, we’ll see how we can calculate the shortest path between these two points.
Let’s plot these two locations on the map:
import folium
m = osmnx.folium.plot_graph_folium(graph,tiles='OpenStreetMap',zoom=16, color="#ff6865")
# UofT main building
center=(43.662643, -79.395689)
# King Edward VII Equestrian Statue
source_point = (43.664527, -79.392442)
# Bahen Centre for Information Technology at UofT
destination_point = (43.659659, -79.397669)
folium.Marker(location = center, popup = 'UofT main building', icon=folium.Icon(color='blue', icon='university', prefix='fa')).add_to(m)
folium.Marker(location=source_point, popup='King Edward VII Equestrian Statue', icon=folium.Icon(color='red', icon='camera', prefix='fa')).add_to(m)
folium.Marker(location=destination_point, popup='Bahen Centre for Information Technology at UofT', icon=folium.Icon(color='green', icon='info', prefix='fa')).add_to(m)
m
Extracting Graph Information¶
We can check various properties of the graph, such as the graph type, edge (road) types, CRS projection, etc.
Graph Type¶
type(graph)
networkx.classes.multidigraph.MultiDiGraph
Note
You can read more about MultiDiGraph here.
Edges and Nodes¶
We can extract the nodes and edges of the graph as separate structures.
nodes, edges = osmnx.graph_to_gdfs(graph)
nodes.head(5)
| y | x | highway | street_count | geometry | |
|---|---|---|---|---|---|
| osmid | |||||
| 20964579 | 43.667513 | -79.399806 | traffic_signals | 4 | POINT (-79.39981 43.66751) |
| 20979738 | 43.665937 | -79.391876 | traffic_signals | 4 | POINT (-79.39188 43.66594) |
| 20979740 | 43.666067 | -79.392493 | NaN | 3 | POINT (-79.39249 43.66607) |
| 20979742 | 43.666368 | -79.393159 | NaN | 3 | POINT (-79.39316 43.66637) |
| 20979743 | 43.665877 | -79.393329 | NaN | 3 | POINT (-79.39333 43.66588) |
edges.head(5)
| osmid | lanes | name | highway | maxspeed | oneway | length | geometry | access | service | tunnel | bridge | width | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| u | v | key | |||||||||||||
| 20964579 | 389677900 | 0 | 3998177 | 3 | Bloor Street West | primary | 50 | False | 8.362 | LINESTRING (-79.39981 43.66751, -79.39971 43.6... | NaN | NaN | NaN | NaN | NaN |
| 390548868 | 0 | 4212261 | 2 | St George Street | tertiary | 30 | False | 7.306 | LINESTRING (-79.39981 43.66751, -79.39978 43.6... | NaN | NaN | NaN | NaN | NaN | |
| 389678082 | 0 | 5090496 | 2 | St George Street | residential | 40 | False | 12.405 | LINESTRING (-79.39981 43.66751, -79.39986 43.6... | NaN | NaN | NaN | NaN | NaN | |
| 389677899 | 0 | 234365457 | 2 | Bloor Street West | primary | 50 | False | 9.046 | LINESTRING (-79.39981 43.66751, -79.39991 43.6... | NaN | NaN | NaN | NaN | NaN | |
| 20979738 | 2144928236 | 0 | 7685256 | NaN | NaN | footway | NaN | False | 9.967 | LINESTRING (-79.39188 43.66594, -79.39182 43.6... | NaN | NaN | NaN | NaN | NaN |
We can further drill down to examine each individual node or edge.
# Rendering the 2nd node
list(graph.nodes(data=True))[1]
(20979738,
{'y': 43.6659368,
'x': -79.3918763,
'highway': 'traffic_signals',
'street_count': 4})
# Rendering the 1st edge
list(graph.edges(data=True))[0]
(20964579,
389677900,
{'osmid': 3998177,
'lanes': '3',
'name': 'Bloor Street West',
'highway': 'primary',
'maxspeed': '50',
'oneway': False,
'length': 8.362})
Street Types¶
We can also get a summary of the street types in our graph.
print(edges['highway'].value_counts())
footway 2030
service 418
residential 186
tertiary 107
path 82
secondary 79
[steps, footway] 58
pedestrian 36
unclassified 24
primary 22
steps 20
[footway, service] 12
secondary_link 10
cycleway 6
[footway, path] 4
[steps, path] 4
[steps, path, service] 4
[corridor, steps, footway] 2
Name: highway, dtype: int64
GeoDataFrame to MultiDiGraph¶
GeoDataFrames can be easily converted back to MultiDiGraphs by using osmnx.graph_from_gdfs.
new_graph = osmnx.graph_from_gdfs(nodes,edges)
osmnx.plot_graph(new_graph,figsize=(10,10))
(<Figure size 720x720 with 1 Axes>, <AxesSubplot:>)
Calculating Network Statistics¶
We can see some basic statistics of our graph using osmnx.basic_stats.
osmnx.basic_stats(graph)
{'n': 1090,
'm': 3104,
'k_avg': 5.695412844036698,
'edge_length_total': 92073.89100000016,
'edge_length_avg': 29.6629803479382,
'streets_per_node_avg': 3.053211009174312,
'streets_per_node_counts': {0: 0, 1: 154, 2: 0, 3: 579, 4: 349, 5: 7, 6: 1},
'streets_per_node_proportions': {0: 0.0,
1: 0.14128440366972478,
2: 0.0,
3: 0.5311926605504587,
4: 0.3201834862385321,
5: 0.006422018348623854,
6: 0.0009174311926605505},
'intersection_count': 936,
'street_length_total': 48561.08599999995,
'street_segment_count': 1613,
'street_length_avg': 30.106066955982612,
'circuity_avg': 1.0297899024310946,
'self_loop_proportion': 0.0012399256044637321}
We can also see the circuity average. Circuity average is the sum of edge lengths divided by the sum of straight line distances. It produces a metric > 1 that indicates how “direct” the network is (i.e. how much extra distance must be travelled using the graph).
# osmnx expects an undirected graph
undir = graph.to_undirected()
osmnx.stats.circuity_avg(undir)
1.0328466941401129
Extended and Density Stats¶
ensity-based statistics requires knowing the area of the graph. To do this, we can first find the convex hull. Let’s also look at some extended statistics.
convex_hull = edges.unary_union.convex_hull
convex_hull
import pandas
area = convex_hull.area
stats = osmnx.basic_stats(graph, area=area)
extended_stats = osmnx.extended_stats(graph,ecc=True,cc=True)
stats.update(extended_stats)
# Show the data in a pandas Series for better formatting
pandas.Series(stats)
/home/yinan/book-env/lib/python3.8/site-packages/osmnx/stats.py:405: UserWarning: The extended_stats function has been deprecated and will be removed in a future release. Use NetworkX directly for extended topological measures.
warnings.warn(msg)
n 1090
m 3104
k_avg 5.695413
edge_length_total 92073.891
edge_length_avg 29.66298
streets_per_node_avg 3.053211
streets_per_node_counts {0: 0, 1: 154, 2: 0, 3: 579, 4: 349, 5: 7, 6: 1}
streets_per_node_proportions {0: 0.0, 1: 0.14128440366972478, 2: 0.0, 3: 0....
intersection_count 936
street_length_total 48561.086
street_segment_count 1613
street_length_avg 30.106067
circuity_avg 1.02979
self_loop_proportion 0.00124
node_density_km 10289596259367.783203
intersection_density_km 8835836787860.775391
edge_density_km 869177215063338.25
street_density_km 458416485189391.5
avg_neighbor_degree {20964579: 3.75, 20979738: 3.0, 20979740: 2.5,...
avg_neighbor_degree_avg 3.126391
avg_weighted_neighbor_degree {20964579: 0.4041057140547967, 20979738: 0.129...
avg_weighted_neighbor_degree_avg 0.18964
degree_centrality {20964579: 0.0073461891643709825, 20979738: 0....
degree_centrality_avg 0.00523
clustering_coefficient {20964579: 0, 20979738: 0, 20979740: 0, 209797...
clustering_coefficient_avg 0.041101
clustering_coefficient_weighted {20964579: 0, 20979738: 0, 20979740: 0, 209797...
clustering_coefficient_weighted_avg 0.005141
pagerank {20964579: 0.0005297849444390717, 20979738: 0....
pagerank_max_node 390545025
pagerank_max 0.003326
pagerank_min_node 306721057
pagerank_min 0.000138
eccentricity {20964579: 1391.2359999999999, 20979738: 1330....
diameter 1744.009
radius 878.559
center [2143404200]
periphery [2143499057]
closeness_centrality {20964579: 0.0013961273083024036, 20979738: 0....
closeness_centrality_avg 0.00153
dtype: object
Warning
extended_stats has already been deprecated from osmnx. In the absence of an alternative, you can access these metrics directly through networkx.
CRS Projection¶
You can also look at the projection of the graph. To find out more about projections, check out this section. Additionally, you can also reproject the graph to a different CRS.
edges.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
merc_edges = edges.to_crs(epsg=3857)
merc_edges.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Shortest Path Analysis¶
Let’s revisit our trip across campus from the statue in Queen’s Park to our lecture hall at the Bahen centre.
To calculate the shortest path, we first need to find the closest nodes on the network to our starting and ending locations.
import geopandas
X = [source_point[1], destination_point[1]]
Y = [source_point[0], destination_point[0]]
closest_nodes = osmnx.distance.nearest_nodes(graph,X,Y)
# Get the rows from the Node GeoDataFrame
closest_rows = nodes.loc[closest_nodes]
# Put the two nodes into a GeoDataFrame
od_nodes = geopandas.GeoDataFrame(closest_rows, geometry='geometry', crs=nodes.crs)
od_nodes
| y | x | highway | street_count | geometry | |
|---|---|---|---|---|---|
| osmid | |||||
| 1907446268 | 43.664507 | -79.392304 | NaN | 3 | POINT (-79.39230 43.66451) |
| 1633421938 | 43.659750 | -79.398047 | NaN | 1 | POINT (-79.39805 43.65975) |
Let’s find and plot the shortest route now!
import networkx
shortest_route = networkx.shortest_path(G=graph,source=closest_nodes[0],target=closest_nodes[1], weight='length')
print(shortest_route)
[1907446268, 55808205, 55808194, 1907446267, 8699033082, 8699033084, 6542457312, 4953810914, 55808233, 299625330, 389677953, 7967019556, 7967019566, 4923076695, 55808571, 55808582, 389678112, 389678113, 389678146, 2143434862, 2143434860, 7311083158, 1258707987, 389678121, 50885147, 389678122, 389677906, 50885141, 389678180, 2143436415, 2143494216, 2143494214, 389678185, 1633421950, 389678184, 389678183, 389678216, 389678215, 389678226, 1633421933, 1633421938]
osmnx.plot_graph_route(graph,shortest_route,figsize=(15,15))
(<Figure size 1080x1080 with 1 Axes>, <AxesSubplot:>)